Your Turn
Open MarathonData.RData
Estimate your first Bayesian Models
Dependent variable: MarathonTimeM
Model1: only an intercept
Model2: introduce the effect of km4week and sp4week on MarathonTimeM
Change the brms defaults (4 chains of 6000 iterations)
Make plots with the plot() function
What do we learn? (rough interpretation)
Note: I centred both km4week and sp4week around their mean!
Results Exercise
Model1: only an intercept (brms defaults)
load(file = here("Presentations", "MarathonData.RData"))

Mod_MT1 <- brm(
  MarathonTimeM ~ 1,
  data = MarathonData,
  backend = "cmdstanr",
  seed = 1975
)
Running MCMC with 4 sequential chains...
All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.5 seconds.
Results Exercise
Model1: only an intercept (brms defaults)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: MarathonTimeM ~ 1
Data: MarathonData (Number of observations: 87)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 199.12 2.42 194.38 203.76 1.00 2903 2125
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 22.84 1.73 19.72 26.39 1.00 3314 2946
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
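The printed summary can also be queried programmatically, which is convenient when you want to reuse the estimates later on. A minimal sketch, assuming the fitted object Mod_MT1 from above:

```r
# Population-level estimates as a matrix (Estimate, Est.Error, CIs)
fixef(Mod_MT1)

# Summaries for selected parameters, including sigma
posterior_summary(Mod_MT1, variable = c("b_Intercept", "sigma"))
```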
Results Exercise
Model1: only an intercept (brms defaults)
Results Exercise
Model2: add the effect of km4week and sp4week
Mod_MT2 <- brm(
  MarathonTimeM ~ 1 + km4week + sp4week,
  data = MarathonData,
  backend = "cmdstanr",
  seed = 1975
)
Running MCMC with 4 sequential chains...
All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.5 seconds.
Results Exercise
Model2: add the effect of km4week and sp4week
Family: gaussian
Links: mu = identity; sigma = identity
Formula: MarathonTimeM ~ 1 + km4week + sp4week
Data: MarathonData (Number of observations: 87)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 199.20 1.50 196.28 202.15 1.00 3852 2838
km4week -0.42 0.06 -0.53 -0.31 1.00 4189 3157
sp4week -9.98 1.30 -12.46 -7.50 1.00 4402 3263
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 13.94 1.09 11.96 16.29 1.00 3991 3236
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Results Exercise
Model2: add the effect of km4week and sp4week
Results Exercise
Model3: change the brms defaults
Mod_MT3 <- brm(
  MarathonTimeM ~ 1 + km4week + sp4week,
  data = MarathonData,
  chains = 4,
  iter = 4000,
  cores = 4,
  backend = "cmdstanr",
  seed = 1975
)
Running MCMC with 4 parallel chains...
All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.2 seconds.
Results Exercise
Model3: change the brms defaults
Family: gaussian
Links: mu = identity; sigma = identity
Formula: MarathonTimeM ~ 1 + km4week + sp4week
Data: MarathonData (Number of observations: 87)
Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
total post-warmup draws = 8000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 199.15 1.51 196.21 202.14 1.00 8054 5568
km4week -0.42 0.06 -0.53 -0.31 1.00 8127 5617
sp4week -9.98 1.29 -12.50 -7.48 1.00 8080 6304
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 13.92 1.07 12.02 16.22 1.00 7672 5845
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Results Exercise
Model3: change the brms defaults
About convergence
\(\widehat R\) < 1.015 for each parameter estimate
at least 4 chains are recommended
Effective Sample Size (ESS) > 400 to rely on \(\widehat R\)
But is it a good model?
Two complementary procedures:
Posterior-predictive check
A visual check that can be performed with pp_check() from brms
pp_check(Mod_MT2) + theme_minimal()
Using 10 posterior draws for ppc type 'dens_overlay' by default.
Model comparison with loo cross-validation
\(\sim\) AIC or BIC in frequentist statistics
\(\widehat{elpd}\): “expected log predictive density” (a higher \(\widehat{elpd}\) implies better model fit without being sensitive to overfitting!)
loo_Mod1 <- loo(MarathonTimes_Mod1)
loo_Mod2 <- loo(MarathonTimes_Mod2)
Warning: Found 1 observations with a pareto_k > 0.7 in model
'MarathonTimes_Mod2'. We recommend to set 'moment_match = TRUE' in order to
perform moment matching for problematic observations.
Comparison <-
  loo_compare(
    loo_Mod1,
    loo_Mod2
  )
print(Comparison, simplify = FALSE)
elpd_diff se_diff elpd_loo se_elpd_loo p_loo se_p_loo
MarathonTimes_Mod2 0.0 0.0 -356.3 12.0 7.2 4.4
MarathonTimes_Mod1 -39.9 12.6 -396.2 5.3 1.7 0.3
looic se_looic
MarathonTimes_Mod2 712.6 24.0
MarathonTimes_Mod1 792.3 10.6
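One common heuristic for reading this table: a difference is convincing when the absolute elpd_diff is large relative to its standard error (several standard errors). With the numbers printed above:

```r
# Values taken from the loo_compare() output above
elpd_diff <- -39.9
se_diff   <- 12.6

# Model 2 is roughly 3 standard errors better than Model 1
elpd_diff / se_diff
```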
WAMBS checklist
When to Worry and How to Avoid Misuse of Bayesian Statistics
by Laurent Smeets and Rens van de Schoot
Before estimating the model:
Do you understand the priors?
After estimation before inspecting results:
Does the trace-plot exhibit convergence?
Does convergence remain after doubling the number of iterations?
Does the posterior distribution histogram have enough information?
Do the chains exhibit a strong degree of autocorrelation?
Do the posterior distributions make substantive sense?
Understanding the exact influence of the priors
Do different specifications of the multivariate variance priors influence the results?
Is there a notable effect of the prior when compared with non-informative priors?
Are the results stable from a sensitivity analysis?
Is the Bayesian way of interpreting and reporting model results used?
Tutorial source: https://www.rensvandeschoot.com/brms-wambs/. Alternatives exist as well, like the BARG framework (Kruschke, J. K. Bayesian Analysis Reporting Guidelines. Nat Hum Behav 5, 1282–1291 (2021). https://doi.org/10.1038/s41562-021-01177-7).
WAMBS Template to use
File called WAMBS_workflow_MarathonData.qmd (quarto document)
Click here for the Quarto version
Create your own project and project folder
Copy the template and rename it
We will go through the different parts in the slide show
You can apply/adapt the code in the template
To render the document properly with references, you also need the references.bib file
Preparations for applying it to Marathon model
Packages needed:
library(here)
library(tidyverse)
library(brms)
library(bayesplot)
library(ggmcmc)
library(patchwork)
library(priorsense)
Preparations for applying it to Marathon model
Load the dataset and the model:
load(file = here("Presentations", "MarathonData.RData"))

MarathonTimes_Mod2 <-
  readRDS(
    file = here("Presentations", "Output", "MarathonTimes_Mod2.RDS")
  )
Focus on the priors before estimation
Remember: priors come in many disguises
Uninformative/Weakly informative
When objectivity is crucial and you want to let the data speak for themselves…
Informative
When including significant information is crucial
previously collected data
results from former research/analyses
data of another source
theoretical considerations
elicitation
brms defaults
Weakly informative priors
If the dataset is big, the impact of the priors is minimal
But, always better to know what you are doing!
Complex models might run into convergence issues \(\rightarrow\) specifying more informative priors might help!
So, how to deviate from the defaults?
Check priors used by brms
Function: get_prior()
Remember our model 2 for Marathon Times:
\[\begin{aligned}
& \text{MarathonTimeM}_i \sim N(\mu_i,\sigma_e)\\
& \mu_i = \beta_0 + \beta_1*\text{km4week}_i + \beta_2*\text{sp4week}_i
\end{aligned}\]
get_prior(
  MarathonTimeM ~ 1 + km4week + sp4week,
  data = MarathonData
)
Check priors used by brms
prior: type of prior distribution
class: parameter class (with b being the population-level effects)
coef: name of the coefficient within the parameter class
group: grouping factor for group-level parameters (when using mixed effects models)
resp: name of the response variable when using multivariate models
lb & ub: lower and upper bound for parameter restrictions
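Since get_prior() returns an ordinary data frame (of class brmsprior), the columns listed above can be inspected and filtered directly; a small sketch, assuming MarathonData is loaded:

```r
priors <- get_prior(
  MarathonTimeM ~ 1 + km4week + sp4week,
  data = MarathonData
)

# Keep only the population-level effects (class "b")
priors[priors$class == "b", c("prior", "class", "coef")]
```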
Visualizing priors
The best way to make sense of the priors used is visualizing them!
Many options:
See WAMBS template!
There we demonstrate the use of ggplot2, metRology, ggtext and patchwork to visualize the priors.
Visualizing priors
library(metRology)
library(ggplot2)
library(ggtext)
library(patchwork)

# Set a plotting theme
theme_set(theme_linedraw() +
  theme(text = element_text(family = "Times", size = 8),
        panel.grid = element_blank(),
        plot.title = element_markdown())
)

# Generate the plot for the prior of the intercept (mu)
Prior_mu <- ggplot() +
  stat_function(
    fun = dt.scaled,  # the scaled-and-shifted t distribution from metRology
    args = list(df = 3, mean = 199.2, sd = 24.9),
    xlim = c(120, 300)
  ) +
  scale_y_continuous(name = "density") +
  labs(title = "Prior for the intercept",
       subtitle = "student_t(3, 199.2, 24.9)")

# Generate the plot for the prior of the error variance (sigma)
Prior_sigma <- ggplot() +
  stat_function(
    fun = dt.scaled,  # we use the dt.scaled function of metRology
    args = list(df = 3, mean = 0, sd = 24.9),
    xlim = c(0, 6)
  ) +
  scale_y_continuous(name = "density") +
  labs(title = "Prior for the residual variance",
       subtitle = "student_t(3, 0, 24.9)")

# Generate the plot for the prior of the effects of the independent variables
Prior_betas <- ggplot() +
  stat_function(
    fun = dnorm,  # we use the normal distribution
    args = list(mean = 0, sd = 10),
    xlim = c(-20, 20)
  ) +
  scale_y_continuous(name = "density") +
  labs(title = "Prior for the effects of independent variables",
       subtitle = "N(0, 10)")

Prior_mu + Prior_sigma + Prior_betas +
  plot_layout(ncol = 3)
Probability density plots for the different priors used in the example model
Understanding priors… another example
Experimental study (pretest - posttest design) with 3 conditions:
control group;
experimental group 1;
experimental group 2.
Model:
\[\begin{aligned}
& \text{Posttest}_{i} \sim N(\mu_i,\sigma_{e})\\
& \mu_i = \beta_0 + \beta_1*\text{Pretest}_{i} + \beta_2*\text{Exp_cond1}_{i} + \beta_3*\text{Exp_cond2}_{i}
\end{aligned}\]
Our job: coming up with priors that reflect that we expect both conditions to have a positive effect (hypothesis based on literature) and no indications that one experimental condition will outperform the other.
Understanding priors… another example
Assuming pre- and posttest scores are standardized
Assuming no increase between pre- and posttest in control condition
Understanding priors… another example
Assuming a strong correlation between pre- and posttest
Understanding priors… another example
Assuming a small effect of experimental conditions
No difference between both experimental conditions
Remember Cohen’s d: 0.2 = small effect size; 0.5 = medium effect size; 0.8 or higher = large effect size
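In brms, such expectations could be encoded along the following lines; note that the exact means and standard deviations below are illustrative choices for standardized scores, not prescribed values:

```r
Priors_experiment <-
  c(
    # strong positive relation between pre- and posttest
    set_prior("normal(0.8, 0.2)", class = "b", coef = "Pretest"),
    # small positive effect (around Cohen's d = 0.2) for both conditions,
    # with identical priors so neither condition is favoured a priori
    set_prior("normal(0.2, 0.1)", class = "b", coef = "Exp_cond1"),
    set_prior("normal(0.2, 0.1)", class = "b", coef = "Exp_cond2")
  )
```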
Setting custom priors in brms
Setting our custom priors can be done with the set_prior() command
E.g., change the priors for the betas (the effects of km4week and sp4week):
Custom_priors <-
  c(
    set_prior(
      "normal(0,10)",
      class = "b",
      coef = "km4week"),
    set_prior(
      "normal(0,10)",
      class = "b",
      coef = "sp4week")
  )
Prior Predictive Check
Did you set sensible priors?
Simulate data based on the model and the priors
Visualize the simulated data and compare with real data
Check if the plot shows impossible simulated datasets
Prior Predictive Check in brms
Step 1: fit the model with the custom priors, using the option sample_prior = "only"
Fit_Model_priors <-
  brm(
    MarathonTimeM ~ 1 + km4week + sp4week,
    data = MarathonData,
    prior = Custom_priors,
    backend = "cmdstanr",
    cores = 4,
    sample_prior = "only"
  )
Prior Predictive Check in brms
Step 2: visualize the data with the pp_check() function
set.seed(1975)
pp_check(
  Fit_Model_priors,
  ndraws = 300)  # number of simulated datasets you wish for
Check some summary statistics
How are summary statistics of simulated datasets (e.g., median, min, max, …) distributed over the datasets?
How does that compare to our real data?
Use the type = "stat" argument within pp_check()
pp_check(Fit_Model_priors,
         type = "stat",
         stat = "median")
Using all posterior draws for ppc type 'stat' by default.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Focus on convergence of the model (before interpreting the model!)
Does the trace-plot exhibit convergence?
Create custom trace-plots (aka caterpillar plots) with the ggs() function from the ggmcmc package
Model_chains <- ggs(MarathonTimes_Mod2)

Model_chains %>%
  filter(Parameter %in% c(
    "b_Intercept",
    "b_km4week",
    "b_sp4week",
    "sigma"
  )) %>%
  ggplot(aes(
    x = Iteration,
    y = value,
    col = as.factor(Chain))) +
  geom_line() +
  facet_grid(Parameter ~ .,
             scale = 'free_y',
             switch = 'y') +
  labs(title = "Caterpillar Plots for the parameters",
       col = "Chains")
Caterpillar plots for the parameters in the model
Does convergence remain after doubling the number of iterations?
Re-fit the model with more iterations
Check trace-plots again
First consider the need to do this! If you have a complex model that already took a long time to run, this check will take at least twice as much time…
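If you do decide to run this check, update() re-uses the already compiled model, so only the sampling is repeated. A sketch, assuming MarathonTimes_Mod2 was fitted with the brms defaults (2000 iterations):

```r
# Refit with double the number of iterations
MarathonTimes_Mod2_double <-
  update(
    MarathonTimes_Mod2,
    iter = 4000,  # double the brms default of 2000
    cores = 4,
    seed = 1975
  )

# Then inspect the trace-plots of this refit as before
```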
Your Turn
Your data and model
Do the first checks on the model convergence
R-hat statistics
Sampling of parameters done by:
multiple chains
multiple iterations within chains
If variance between chains is big \(\rightarrow\) NO CONVERGENCE
R-hat (\(\widehat{R}\)): compares the between- and within-chain estimates for model parameters
R-hat statistics
\(\widehat{R}\) < 1.015 for each parameter estimate
at least 4 chains are recommended
Effective Sample Size (ESS) > 400 to rely on \(\widehat{R}\)
R-hat in brms
mcmc_rhat() function from the bayesplot package
mcmc_rhat(
  brms::rhat(MarathonTimes_Mod2),
  size = 3
) +
  yaxis_text(hjust = 1)  # to print parameter names
Autocorrelation
Successive draws of parameter values are not independent!
So there is autocorrelation
But you don’t want autocorrelation to have too much impact
2 approaches to check this:
ratio of the effective sample size to the total sample size
plot degree of autocorrelation
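The effective sample size behind this ratio discounts the total number of draws \(N\) by the lag-\(k\) autocorrelations \(\rho_k\) (a standard formulation):

\[
N_{\mathrm{eff}} = \frac{N}{1 + 2\sum_{k=1}^{\infty} \rho_k}
\]

With independent draws the ratio \(N_{\mathrm{eff}}/N\) is 1; strong positive autocorrelation pushes it towards 0.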
Ratio effective sample size / total sample size
Should be higher than 0.1 (Gelman et al., 2013)
Visualize using the mcmc_neff() function from bayesplot
mcmc_neff(
  neff_ratio(MarathonTimes_Mod2)
) +
  yaxis_text(hjust = 1)  # to print parameter names
Plot degree of autocorrelation
Visualize using the mcmc_acf() function
mcmc_acf(
  as.array(MarathonTimes_Mod2),
  regex_pars = "b")  # plot only the parameters starting with b (our betas)
Rank order plots
additional way to assess the convergence of MCMC
if the algorithm converged, plots of all chains look similar
mcmc_rank_hist(
  MarathonTimes_Mod2,
  regex_pars = "b"  # only intercept and betas
)
Focus on the Posterior
Does the posterior distribution histogram have enough information?
Plotting the posterior distribution histogram
Step 1: create a new object with ‘draws’ based on the final model
posterior_PD <- as_draws_df(MarathonTimes_Mod2)
Plotting the posterior distribution histogram
Step 2: create histogram making use of that object
post_intercept <-
  posterior_PD %>%
  select(b_Intercept) %>%
  ggplot(aes(x = b_Intercept)) +
  geom_histogram() +
  ggtitle("Intercept")
Warning: Dropping 'draws_df' class as required metadata was removed.
post_km4week <-
  posterior_PD %>%
  select(b_km4week) %>%
  ggplot(aes(x = b_km4week)) +
  geom_histogram() +
  ggtitle("Beta km4week")
Warning: Dropping 'draws_df' class as required metadata was removed.
post_sp4week <-
  posterior_PD %>%
  select(b_sp4week) %>%
  ggplot(aes(x = b_sp4week)) +
  geom_histogram() +
  ggtitle("Beta sp4week")
Warning: Dropping 'draws_df' class as required metadata was removed.
Plotting the posterior distribution histogram
Step 3: print the plot making use of patchwork’s workflow to combine plots
post_intercept + post_km4week + post_sp4week +
  plot_layout(ncol = 3)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Posterior Predictive Check
Generate data based on the posterior probability distribution
Create plot of distribution of y-values in these simulated datasets
Overlay with distribution of observed data
using pp_check() again, now with our model
pp_check(MarathonTimes_Mod2,
         ndraws = 100)
Posterior Predictive Check
We can also focus on some summary statistics (as we did with the prior predictive checks)
pp_check(MarathonTimes_Mod2,
         ndraws = 300,
         type = "stat",
         stat = "median")
Prior sensitivity analyses
Why prior sensitivity analyses?
Often we rely on ‘arbitrarily’ chosen (default) weakly informative priors
What is the influence of the prior (and the likelihood) on our results?
You could set new priors ad hoc, re-run the analyses, and compare (a lot of work, without strict systematic guidelines)
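Such a manual check could look like this sketch (the alternative prior values below are purely illustrative):

```r
# An alternative, somewhat tighter set of priors (values are illustrative only)
Alt_priors <- c(
  prior(normal(190, 10), class = Intercept),
  prior(normal(0, 2), class = b)
)

# Re-fit the model with the alternative priors
Mod2_alt <- update(
  MarathonTimes_Mod2,
  prior = Alt_priors
)

# Compare the parameter summaries of both fits
summary(MarathonTimes_Mod2)
summary(Mod2_alt)
```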
Semi-automated checks can be done with the priorsense package
Using the priorsense package
Recently a package dedicated to prior sensitivity analyses was launched
# install.packages("remotes")
remotes::install_github("n-kall/priorsense")
Key-idea: power-scaling (both prior and likelihood)
background reading:
YouTube talk:
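The power-scaling idea: perturb the prior (or, analogously, the likelihood) by an exponent \(\alpha\) close to 1 and see how much the posterior moves:

\[
p_{\alpha}(\theta \mid y) \;\propto\; p(\theta)^{\alpha}\, p(y \mid \theta)
\]

If small changes in \(\alpha\) barely change the posterior, the results are robust to the prior; large shifts signal prior sensitivity.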
Basic table with indices
A first check is done with the powerscale_sensitivity() function
the column prior contains info on sensitivity to the prior (should be lower than 0.05)
the column likelihood contains info on sensitivity to the likelihood (which we want to be high: ‘let our data speak’)
the column diagnosis gives a verbalization of the potential problem (- if none)
powerscale_sensitivity(MarathonTimes_Mod2)
Sensitivity based on cjs_dist:
# A tibble: 4 × 4
variable prior likelihood diagnosis
<chr> <dbl> <dbl> <chr>
1 b_Intercept 0.000853 0.0851 -
2 b_km4week 0.000512 0.0802 -
3 b_sp4week 0.000370 0.0831 -
4 sigma 0.00571 0.151 -
Visualization of prior sensitivity
powerscale_plot_dens(
  powerscale_sequence(
    MarathonTimes_Mod2
  ),
  variable = c(
    "b_Intercept",
    "b_km4week",
    "b_sp4week"
  )
)
Visualization of prior sensitivity
powerscale_plot_quantities(
  powerscale_sequence(
    MarathonTimes_Mod2
  ),
  variable = c(
    "b_km4week"
  )
)